Abstract

This paper addresses the problem of learning binary hash codes for large scale
image search by proposing a novel hashing method based on a deep neural network.
The advantage of our deep model over previous deep models used in hashing is that
our model contains the necessary criteria for producing good codes, namely
similarity preservation, balance and independence. Another advantage of our method
is that, instead of relaxing the binary constraint on the codes during learning as
most previous works do, we introduce an auxiliary variable and reformulate the
optimization into two sub-optimization steps, which allows us to handle the binary
constraint efficiently without any relaxation.

The proposed method is also extended to supervised hashing by leveraging the label
information so that the learned binary codes preserve the pairwise labels of the
inputs.

Experimental results on three benchmark datasets show that the proposed methods
outperform state-of-the-art hashing methods.

1. Introduction 

Large scale visual search has attracted attention because of the easy availability
of huge amounts of data and its wide range of applications [3]. The two main
difficulties in large scale visual search are efficient storage and fast searching.
An attractive approach for handling these difficulties is binary hashing, where
each original high dimensional vector $x \in \mathbb{R}^{D}$ is mapped to a low
dimensional binary vector $b \in \{-1,1\}^{L}$, where $L \ll D$. The resulting
binary vectors allow efficient storage. Furthermore, while searching in the
original space costs $O(ND)$, where $N$ is the database size, searching in the
binary space costs $O(NL)$ with a much smaller constant factor. This is because
hardware can efficiently compute the distance between data points in the binary
space (e.g. using the XOR operator) and the entire dataset ($NL$ bits) can fit in
main memory. A wide range of hashing methods has been proposed in the
literature [8, 33]. They can be divided into two categories: data-independent and
data-dependent.
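The XOR-based distance computation mentioned above can be sketched as follows. This is a minimal illustration rather than the implementation used in the paper; the helper names `pack_codes` and `hamming_distances`, the byte-level lookup table and the random data are all assumptions made for the example.

```python
import numpy as np

def pack_codes(codes):
    """Pack a (num_points, L) array of {-1, +1} codes into bytes (8 bits per byte)."""
    bits = (np.asarray(codes) > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)

# Lookup table (an implementation choice): number of set bits in each byte value.
POPCOUNT = np.array([bin(v).count("1") for v in range(256)], dtype=np.uint16)

def hamming_distances(query_packed, database_packed):
    """Hamming distance from one packed query to every packed database code."""
    xor = np.bitwise_xor(database_packed, query_packed)  # set bits mark differences
    return POPCOUNT[xor].sum(axis=1)

# Hypothetical data: 1000 random 32-bit codes and a query that differs
# from database item 7 in exactly three bits.
rng = np.random.default_rng(0)
database = rng.choice([-1, 1], size=(1000, 32))
query = database[7].copy()
query[:3] *= -1

distances = hamming_distances(pack_codes(query[None, :])[0], pack_codes(database))
```

Each 32-bit code occupies four bytes here, so a linear scan touches only $NL$ bits, which is the storage figure quoted above.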

Most methods in the data-independent category rely on random projections to
generate hash functions. Representative of this category are Locality-Sensitive
Hashing (LSH) [5] and its extensions from the Euclidean distance to other
distances, such as kernelized LSH [15, 28] and LSH with the Mahalanobis
distance [16].

Instead of using random projections, the data-dependent category uses available
training data to learn hash functions in an unsupervised or supervised way.
Representative of this category are unsupervised hashing methods such as Spectral
Hashing [34], Iterative Quantization (ITQ) [6], K-means Hashing [9], Spherical
Hashing [10] and Isotropic Hashing [12], and supervised hashing methods such as
LDA Hashing [31], Minimal Loss Hashing [25, 26], ITQ-CCA [6], FastHash [18] and
Binary Reconstructive Embedding [14].

One difficult problem in hashing is to design hash functions that can capture
nonlinear structures in the input space. Most of the aforementioned methods assume
linear hash functions, so they may not capture the nonlinear manifold structure of
the inputs well. Although several kernel-based hashing methods have been
proposed [20, 15, 28, 7], they suffer from scalability problems.

Another difficult problem in hashing is dealing with the binary constraint on the
codes. In general, the binary constraint imposed on the output of the hash
functions leads to a mixed-integer optimization problem, which is NP-hard. To
handle this difficulty, most of the aforementioned methods relax the constraint
during the learning process. With this relaxation, continuous codes are learned
first and are then binarized (e.g. by thresholding or with an optimal rotation).
This relaxation greatly simplifies the original binary-constrained problem, but its
solution is suboptimal: the binary codes obtained by thresholding the continuous
codes are not necessarily the same as the binary codes obtained by handling the
binarization directly in the learning process.

1.1. Related work 

In order to better capture the nonlinear manifold structure of the inputs, a few
hashing methods [29, 4, 2] rely on deep learning techniques. Semantic hashing [29]
is the first work using deep learning for hashing. Its model is formed by stacked
Restricted Boltzmann Machines, and a pretraining step is required to train the
model. In [2], the authors use a linear autoencoder as the hash function, seeking
to reconstruct an input from the binary code produced by the hidden layer of the
network. Because the model in [2] only uses a shallow network (i.e. only one hidden
layer) with a linear activation function, it may not capture the nonlinear
structure of the inputs well. In [4], the authors use a deep neural network as the
hash function. However, their unsupervised hashing method does not have the
similarity preserving property, that is, not only should similar inputs likely
have similar binary codes, but different inputs should also likely have different
binary codes. The similarity preserving property has been indicated as an
important criterion for hashing methods [34].

In order to handle the binary constraint, semantic hashing [29] and deep
hashing [4] first solve the relaxed problem during learning by discarding the
constraint, and then threshold the continuous solution to obtain the binary
solution. In contrast to [29, 4], linear binary autoencoder-based hashing [2]
directly solves the binary constraint during the learning process. It uses an
exhaustive search (i.e., searching over $2^L$ solutions) to find the binary code
that minimizes the objective function (the reconstruction error). This may make
training time-consuming when a large number of bits is used to encode a sample.
Recently, in supervised discrete hashing (SDH) [30], the authors proposed a new
method named discrete cyclic coordinate descent, which efficiently solves the
binary constraint without relaxation. By solving the binary constraint bit by bit,
they obtain an analytic solution for the bit being processed. This makes the
training process very efficient. It is worth noting that the objective function of
SDH [30] is designed under the assumption that good hash codes are optimal for
linear classification. This assumption may not be directly related to the
retrieval problem.

1.2. Contribution 

In this work, we first propose a novel unsupervised hashing method based on deep
learning techniques. By using a deep neural network with nonlinear activation
functions, our method can capture complex structure in the inputs. Our objective
function includes the criteria [34] for producing good binary codes, namely the
similarity preserving, independence and balance properties. This differs from [4],
where only the independence and balance properties are considered. Furthermore,
instead of relaxing the binary constraint as in previous work [4], we directly
solve the binary constraint during the learning process, resulting in binary codes
of better quality. The main differences between our hashing method and the recent
deep learning-based unsupervised hashing methods Deep Hashing (DH) [4] and linear
Binary Autoencoder (BA) [2] are summarized in Table 1. The compared criteria are:
Is the network model deep? Does the objective function consider the similarity
preserving, independence and balance properties of the binary codes? How is the
binary constraint on the codes solved in the learning process?

Table 1. Differences between our method and deep learning-based unsupervised
hashing [4, 2].

                               DH [4]        BA [2]              Ours
Is model deep?                 Yes           No                  Yes
Similarity preserving?         No            Yes                 Yes
Independence?                  Yes^1         No                  Yes
Balance?                       Yes           No                  Yes
How to solve binary const.?    Relaxation    Exhaustive search   Closed-form

After introducing the new method for unsupervised hashing, we extend it to
supervised hashing by leveraging the label information so that the binary codes
preserve the semantic (label) similarity between samples. Our main contributions
are summarized as follows.

• We propose a novel deep learning-based hashing method that produces binary codes
having the expected properties, namely similarity preservation, independence and
balance.

• We directly solve the binary constraint during the learning process. The idea is
to adapt the regularization approach [22] and the recently proposed discrete
cyclic coordinate descent method [30].

• The proposed method is first evaluated in the unsupervised hashing setting. We
then extend it to the supervised hashing setting by leveraging the label
information.

• Extensive results on three benchmark datasets show the improvement of the
proposed method over several state-of-the-art hashing methods.

The remainder of this paper is organized as follows. Section 2 presents our
proposed method for unsupervised hashing. Section 3 evaluates the proposed
unsupervised hashing method. Section 4 presents our proposed method for supervised
hashing. Section 5 evaluates the proposed supervised hashing method. Section 6
concludes the paper.


^1 Although the authors of Deep Hashing [4] considered the independence property
in their objective function, they relaxed it by imposing the independence property
on the weights of the network. This differs from our method, where the
independence property is imposed directly on the codes.

2. Unsupervised Discrete Hashing with Deep 
Neural Network (UDH-DNN) 

2.1. Formulation of UDH-DNN 

Let $X = \{x_i\}_{i=1}^{m} \in \mathbb{R}^{D \times m}$ be the set of $m$ training
samples; each column of $X$ corresponds to one sample. We aim to learn a binary
code for each sample. Let $B = \{b_i\}_{i=1}^{m} \in \{-1,1\}^{L \times m}$ be the
binary code matrix of $X$; $L$ is the desired number of bits to encode a sample. In
our work, the hash functions are defined as a deep neural network having $n$ layers
(including the input and output layers).

Let $s_l$ be the number of units in layer $l$; $f^{(l)}$ be the activation function
of layer $l$; $H^{(l)} = [h_1^{(l)}, \cdots, h_m^{(l)}] \in \mathbb{R}^{s_l \times m}$
be the output values of layer $l$ (for clarity in later sections, we use
$H^{(1)} = X$); $W^{(l)} \in \mathbb{R}^{s_{l+1} \times s_l}$ be the weight matrix
connecting layer $l+1$ and layer $l$; and $c^{(l)} \in \mathbb{R}^{s_{l+1}}$ be the
bias vector for the units in layer $l+1$.

Our idea is to learn a deep neural network such that the sign of the output values
of layer $n-1$ can be used as binary codes, and those codes should give a good
reconstruction of the input. To achieve this goal, we choose to optimize the
following objective function

$$
\min_{W,c} J = \frac{1}{2m} \left\| X - \left( W^{(n-1)} \mathrm{sgn}(H^{(n-1)}) + c^{(n-1)} 1_{1 \times m} \right) \right\|^2
+ \frac{\lambda_1}{2} \sum_{l=1}^{n-1} \left\| W^{(l)} \right\|^2 \tag{1}
$$

where $1_{1 \times m}$ is a row vector with all elements equal to 1. In our
formulation (1), the binary code $B$ is defined as $B = \mathrm{sgn}(H^{(n-1)})$.

The first term of the objective function (1) ensures that the binary code $B$
gives a good reconstruction of $X$. It is worth noting that the reconstruction
criterion does not directly measure similarity preservation, but it has been
indicated in deep learning-based hashing methods [2, 29] that a hash function
defined by a neural network with a reconstruction criterion can capture the data
manifolds in a smooth way and indirectly preserve similarity, encouraging
(dis)similar inputs to have (dis)similar codes. The second term is a
regularization term that tends to decrease the magnitude of the weights and helps
to prevent overfitting^2. It is worth noting in (1) that if we replace
$\mathrm{sgn}(H^{(n-1)})$ by $H^{(n-1)}$, the objective function (1) can be seen as
a deep autoencoder with a linear decoder layer (i.e. the last layer $n$ uses a
linear activation function).
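To make objective (1) concrete, the following minimal sketch evaluates it for a small randomly initialized network. The function name `udh_objective`, the layer sizes, and the choice of sigmoid hidden layers followed by a linear code layer are assumptions made for illustration; ties at zero are mapped to +1, which is also an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def udh_objective(X, weights, biases, lam1):
    """Evaluate objective (1) for a network with n = len(weights) + 1 layers.

    Assumed activations: sigmoid for layers 2 .. n-2, linear for layer n-1
    (the code layer) and for the decoder layer n.
    """
    m = X.shape[1]
    H = X
    code_layer = len(weights) - 2          # index of the weights producing H^(n-1)
    for idx in range(len(weights) - 1):
        Z = weights[idx] @ H + biases[idx][:, None]
        H = Z if idx == code_layer else sigmoid(Z)
    B = np.where(H >= 0, 1.0, -1.0)        # B = sgn(H^(n-1)), sgn(0) taken as +1
    recon = weights[-1] @ B + biases[-1][:, None]
    data_term = np.sum((X - recon) ** 2) / (2 * m)
    reg_term = 0.5 * lam1 * sum(np.sum(W ** 2) for W in weights)
    return data_term + reg_term, B

# Hypothetical sizes: D = 6 inputs, n = 5 layers, L = 3 bits, m = 20 samples.
rng = np.random.default_rng(0)
sizes = [6, 5, 4, 3, 6]
m = 20
X = rng.standard_normal((sizes[0], m))
weights = [0.5 * rng.standard_normal((sizes[i + 1], sizes[i]))
           for i in range(len(sizes) - 1)]
biases = [np.zeros(sizes[i + 1]) for i in range(len(sizes) - 1)]
J, B = udh_objective(X, weights, biases, lam1=1e-3)
```

Because the sign function has zero gradient almost everywhere, this form cannot be minimized directly by backpropagation, which is one way to read the motivation for the auxiliary-variable reformulation.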

Equivalently, by introducing the auxiliary variable $B$, the objective function (1)
can be rewritten as

$$
\min_{W,c,B} J = \frac{1}{2m} \left\| X - W^{(n-1)} B - c^{(n-1)} 1_{1 \times m} \right\|^2
+ \frac{\lambda_1}{2} \sum_{l=1}^{n-1} \left\| W^{(l)} \right\|^2 \tag{2}
$$

$$
\text{s.t.} \quad B = \mathrm{sgn}(H^{(n-1)}) \tag{3}
$$

^2 As noted by Ng [1], the regularization is not usually applied to the bias
terms $c$. Applying the regularization to the bias usually makes only a small
difference to the final network.


The benefit of introducing the auxiliary variable $B$ is that we can decompose the
difficult optimization problem (1) into two sub-optimization problems, which we
solve iteratively by alternately optimizing with respect to $(W, c)$ and $B$ while
holding the other fixed. The idea of using an auxiliary variable was also used
in [2] for learning binary codes, but [2] only solves the case where the hash
function is a linear autoencoder.

As mentioned in [34], a good binary code should have not only the similarity
preserving property but also the independence and balance properties. That is,
different bits are independent of each other, and each bit has a 50% chance of
being 1 or $-1$. We therefore add two more constraints (independence and balance)
to problem (2). The new objective function is defined as


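$$
\min_{W,c,B} J = \frac{1}{2m} \left\| X - W^{(n-1)} B - c^{(n-1)} 1_{1 \times m} \right\|^2
+ \frac{\lambda_1}{2} \sum_{l=1}^{n-1} \left\| W^{(l)} \right\|^2 \tag{4}
$$

s.t.

$$
B = \mathrm{sgn}(H^{(n-1)}) \tag{5}
$$

$$
\frac{1}{m} B B^T = I \tag{6}
$$

$$
\frac{1}{m} \left\| B 1_{m \times 1} \right\|^2 = 0 \tag{7}
$$
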

where $I$ is the identity matrix. Problem (4) under these constraints is still
NP-hard and difficult to solve because of the discrete variable $B$. One way to
handle this difficulty is to relax constraint (5) to $B = H^{(n-1)}$. With this
approach, the binary solution is obtained by first relaxing the binary codes to a
continuous space and then post-processing, i.e. thresholding, the continuous
solution. Most existing approaches follow this relaxation, such as Deep
Hashing [4], Semantic Hashing [29], Spectral Hashing [34], AnchorGraph
Hashing [21], Semi-Supervised Hashing [32] and LDAHash [31]. This relaxation
simplifies the original binary-constrained problem, but its solution is
suboptimal: the binary codes obtained by thresholding the continuous codes are not
necessarily the same as the codes obtained by handling the binarization directly
in the optimization.
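As a small numerical illustration of the independence constraint (6) and the balance constraint (7), the sketch below measures how far a given code matrix is from satisfying each of them. The function names and the two toy code matrices are hypothetical and chosen only so that the values can be checked by hand.

```python
import numpy as np

def independence_residual(B):
    """Squared deviation of (1/m) B B^T from the identity, cf. constraint (6)."""
    L, m = B.shape
    return float(np.sum((B @ B.T / m - np.eye(L)) ** 2))

def balance_residual(B):
    """(1/m) * ||B 1||^2, which is zero when every bit is balanced, cf. (7)."""
    m = B.shape[1]
    return float(np.sum(B.sum(axis=1) ** 2) / m)

# Toy codes with L = 2 bits and m = 4 samples (one column per sample).
B_good = np.array([[1.0, 1.0, -1.0, -1.0],
                   [1.0, -1.0, 1.0, -1.0]])   # balanced and uncorrelated bits
B_bad = np.array([[1.0, 1.0, 1.0, -1.0],
                  [1.0, 1.0, 1.0, -1.0]])     # duplicated, unbalanced bits

good = (independence_residual(B_good), balance_residual(B_good))
bad = (independence_residual(B_bad), balance_residual(B_bad))
```

For `B_good` both residuals vanish, whereas the duplicated bit in `B_bad` violates both constraints.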

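In order to achieve binary codes of better quality, we should solve the binary
constraint during the learning of the hash function. Inspired by the
regularization methods [22], we rewrite (4) and constraints (5), (6), (7) as

$$
\min_{W,c,B} J = \frac{1}{2m} \left\| X - W^{(n-1)} B - c^{(n-1)} 1_{1 \times m} \right\|^2
+ \frac{\lambda_1}{2} \sum_{l=1}^{n-1} \left\| W^{(l)} \right\|^2
+ \frac{\lambda_2}{2m} \left\| H^{(n-1)} - B \right\|^2 \tag{8}
$$

s.t.

$$
B \in \{-1, 1\}^{L \times m} \tag{9}
$$

$$
\frac{1}{m} H^{(n-1)} (H^{(n-1)})^T = I \tag{10}
$$

$$
\frac{1}{m} \left\| H^{(n-1)} 1_{m \times 1} \right\|^2 = 0 \tag{11}
$$
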


The third term in (8) minimizes the discretization error between the continuous
code $H^{(n-1)}$ and the binary code $B$. It is shown in [22] that with
sufficiently large $\lambda_2$, minimizing (8) under constraint (9) becomes close
to minimizing (4) under constraint (5). When $\lambda_2$ is sufficiently large, the
optimization process results in $B \approx H^{(n-1)}$, so we can replace
constraints (6), (7) by constraints (10), (11).

The recent work SDH [30] on supervised hashing also used the idea of the
regularization method [22]. However, that work focuses on supervised hashing; its
formulation is based on the assumption that the resulting codes are good for
linear classification; furthermore, it does not consider the independence and
balance properties of the codes. Our work differs in that it focuses on
unsupervised hashing, makes no assumption on the codes, uses a deep neural network
as the hash function, and considers the independence and balance properties of the
codes.

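Instead of solving (8) under many constraints, we use the Lagrange multiplier
approach and solve the following similar problem

$$
\min_{W,c,B} J = \frac{1}{2m} \left\| X - W^{(n-1)} B - c^{(n-1)} 1_{1 \times m} \right\|^2
+ \frac{\lambda_1}{2} \sum_{l=1}^{n-1} \left\| W^{(l)} \right\|^2
+ \frac{\lambda_2}{2m} \left\| H^{(n-1)} - B \right\|^2
+ \frac{\lambda_3}{2} \left\| \frac{1}{m} H^{(n-1)} (H^{(n-1)})^T - I \right\|^2
+ \frac{\lambda_4}{2m} \left\| H^{(n-1)} 1_{m \times 1} \right\|^2 \tag{12}
$$

$$
\text{s.t.} \quad B \in \{-1, 1\}^{L \times m} \tag{13}
$$

2.2. Optimization

To solve (12) under constraint (13), we alternately optimize over $(W, c)$ and $B$.

2.2.1 (W, c) step

When fixing $B$, the problem becomes an unconstrained optimization. We use the
L-BFGS [19, 24] optimizer with backpropagation to solve it. The gradients of the
objective function $J$ (12) w.r.t. the different parameters are computed as
follows.

$$
\frac{\partial J}{\partial W^{(n-1)}} = -\frac{1}{m} \left( X - W^{(n-1)} B - c^{(n-1)} 1_{1 \times m} \right) B^T + \lambda_1 W^{(n-1)} \tag{14}
$$

$$
\frac{\partial J}{\partial c^{(n-1)}} = -\frac{1}{m} \left( X - W^{(n-1)} B - c^{(n-1)} 1_{1 \times m} \right) 1_{m \times 1} \tag{15}
$$

Let us define

$$
\Delta^{(n-1)} = \left[ \frac{\lambda_2}{m} \left( H^{(n-1)} - B \right)
+ \frac{2\lambda_3}{m} \left( \frac{1}{m} H^{(n-1)} (H^{(n-1)})^T - I \right) H^{(n-1)}
+ \frac{\lambda_4}{m} H^{(n-1)} 1_{m \times m} \right] \odot f^{(n-1)\prime}(Z^{(n-1)}) \tag{16}
$$

$$
\Delta^{(l)} = \left( (W^{(l)})^T \Delta^{(l+1)} \right) \odot f^{(l)\prime}(Z^{(l)}), \quad \forall l = n-2, \cdots, 2 \tag{17}
$$

where $\odot$ denotes the Hadamard product and
$Z^{(l)} = W^{(l-1)} H^{(l-1)} + c^{(l-1)} 1_{1 \times m}$, $l = 2, \cdots, n$.

Then, $\forall l = n-2, \cdots, 1$, we have

$$
\frac{\partial J}{\partial W^{(l)}} = \Delta^{(l+1)} (H^{(l)})^T + \lambda_1 W^{(l)} \tag{18}
$$

$$
\frac{\partial J}{\partial c^{(l)}} = \Delta^{(l+1)} 1_{m \times 1} \tag{19}
$$

2.2.2 B step

When fixing $(W, c)$, we can rewrite problem (12) as

$$
\min_{B} J = \left\| X - W^{(n-1)} B - c^{(n-1)} 1_{1 \times m} \right\|^2 + \lambda_2 \left\| H^{(n-1)} - B \right\|^2 \tag{20}
$$

$$
\text{s.t.} \quad B \in \{-1, 1\}^{L \times m} \tag{21}
$$
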
Solving for $B$ is challenging because of the binary constraint on $B$. Here we
use the recently proposed discrete cyclic coordinate descent method [30]. The
advantage of this method is that if we fix $L-1$ rows of $B$ and only solve for
the remaining row, we obtain a closed-form solution for that row. This means that
we can iteratively solve for $B$ row by row.

Let $V = X - c^{(n-1)} 1_{1 \times m}$ and
$Q = (W^{(n-1)})^T V + \lambda_2 H^{(n-1)}$. For $k = 1, \cdots, L$, let $w_k$ be
the $k$-th column of $W^{(n-1)}$; $W_1$ the matrix $W^{(n-1)}$ excluding $w_k$;
$q_k$ the $k$-th column of $Q^T$; $b_k^T$ the $k$-th row of $B$; and $B_1$ the
matrix $B$ excluding $b_k^T$. We have the closed-form solution for $b_k^T$

$$
b_k^T = \mathrm{sgn}\left( q_k^T - w_k^T W_1 B_1 \right) \tag{22}
$$

The proposed UDH-DNN method is summarized in Algorithm 1. In Algorithm 1,
$B_{(t)}$ and $(W, c)_{(t)}$ are the values of $B$ and
$\{W^{(l)}, c^{(l)}\}_{l=1}^{n-1}$ at iteration $t$.

Algorithm 1 Unsupervised Discrete Hashing with Deep Neural Network (UDH-DNN)

Input:
  $X = \{x_i\}_{i=1}^{m} \in \mathbb{R}^{D \times m}$: training data; $L$: code
  length; max_iter: maximum iteration number; $n$: number of layers;
  $\{s_l\}_{l=2}^{n}$: number of units of layers $2 \to n$ (Note: the numbers of
  units of layers $n-1$ and $n$ should equal $L$ and $D$, respectively);
  $\lambda_1, \lambda_2, \lambda_3, \lambda_4$.
Output:
  Binary code $B \in \{-1,1\}^{L \times m}$ of training data $X$; parameters
  $\{W^{(l)}, c^{(l)}\}_{l=1}^{n-1}$.

1: Initialize $B_{(0)} \in \{-1,1\}^{L \times m}$ using ITQ [6].
2: Initialize $\{c^{(l)}\}_{l=1}^{n-1} = 0_{s_{l+1} \times 1}$. Initialize
   $W^{(1)}$ by getting the top $s_2$ eigenvectors from the covariance matrix of
   $X$. Initialize $\{W^{(l)}\}_{l=2}^{n-2}$ by getting the top $s_{l+1}$
   eigenvectors from the covariance matrix of $H^{(l)}$. Initialize
   $W^{(n-1)} = I_{D \times L}$.
3: Compute $(W, c)_{(0)}$ with the (W, c) step (Sec. 2.2.1), using $B_{(0)}$ as
   fixed value and the initialized $\{W^{(l)}, c^{(l)}\}_{l=1}^{n-1}$ (at line 2)
   as starting point for L-BFGS.
4: for $t = 1 \to$ max_iter do
5:   Compute $B_{(t)}$ by iteratively learning row by row with the B step
     (Sec. 2.2.2), using $(W, c)_{(t-1)}$ as fixed values.
6:   Compute $(W, c)_{(t)}$ with the (W, c) step (Sec. 2.2.1), using $B_{(t)}$ as
     fixed value and $(W, c)_{(t-1)}$ as starting point for L-BFGS.
7: end for
8: Return $B_{(\text{max\_iter})}$ and $(W, c)_{(\text{max\_iter})}$.
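The row-by-row update of the B step can be sketched in NumPy as below. This is an illustrative reading of the closed-form update $b_k^T = \mathrm{sgn}(q_k^T - w_k^T W_1 B_1)$, not the authors' code: the function names, the number of sweeps over the rows, the random test data and the convention sgn(0) = +1 are assumptions.

```python
import numpy as np

def b_step_objective(X, W, c, H, B, lam2):
    """||X - W B - c 1||^2 + lam2 * ||H - B||^2, the B-step objective."""
    V = X - c[:, None]
    return float(np.sum((V - W @ B) ** 2) + lam2 * np.sum((H - B) ** 2))

def b_step(X, W, c, H, B, lam2, sweeps=3):
    """Discrete cyclic coordinate descent over the rows of B.

    W is the decoder weight W^(n-1) (D x L), c its bias (D,),
    H the continuous code H^(n-1) (L x m), B the current binary code (L x m).
    """
    B = B.copy()
    V = X - c[:, None]
    Q = W.T @ V + lam2 * H                      # L x m; row k is q_k^T
    L = B.shape[0]
    for _ in range(sweeps):                     # number of sweeps is an assumption
        for k in range(L):
            rest = [j for j in range(L) if j != k]
            w_k, W1, B1 = W[:, k], W[:, rest], B[rest, :]
            r = Q[k, :] - w_k @ W1 @ B1         # q_k^T - w_k^T W_1 B_1
            B[k, :] = np.where(r >= 0, 1.0, -1.0)
    return B

# Hypothetical problem: D = 8, L = 4, m = 50.
rng = np.random.default_rng(0)
D, L, m = 8, 4, 50
X = rng.standard_normal((D, m))
W = rng.standard_normal((D, L))
c = rng.standard_normal(D)
H = rng.standard_normal((L, m))
B0 = np.where(H >= 0, 1.0, -1.0)

before = b_step_objective(X, W, c, H, B0, lam2=0.5)
B_new = b_step(X, W, c, H, B0, lam2=0.5)
after = b_step_objective(X, W, c, H, B_new, lam2=0.5)
```

Each row update exactly minimizes the objective over that row with the other rows fixed, so the objective value cannot increase from one update to the next.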

3. Evaluation of Unsupervised Discrete Hashing with Deep Neural Network

This section presents the results of UDH-DNN. We compare UDH-DNN with the
following state-of-the-art unsupervised hashing methods: Spectral Hashing
(SH) [34], Iterative Quantization (ITQ) [6], Binary Autoencoder (BA) [2],
Spherical Hashing (SPH) [10] and K-means Hashing (KMH) [9]. For all compared
methods, we use the code and the suggested parameters provided by the authors.

3.1. Dataset, implementation note, and evaluation 
protocol 

CIFAR-10. CIFAR-10 [13] contains 60,000 color images of 10 classes. Each image has
a size of 32 x 32. The training set contains 50,000 images, and the testing set
contains 10,000 images. In this experiment, we ignore the class labels. As is
standard in the literature [6, 2], we extract 320-D GIST features [27] from each
image.

MNIST. The MNIST [17] dataset consists of 70,000 handwritten digit images of 10
classes (labeled from 0 to 9). Each image has a size of 28 x 28. The training set
contains 60,000 samples, and the test set contains 10,000 samples. In this
experiment, we ignore the class labels. Each image is represented as a 784-D
gray-scale feature vector using its intensity values.

SIFT1M. The SIFT1M [11] dataset contains 128-D SIFT vectors. It is a standard
dataset for evaluating large scale approximate nearest neighbor search. There are
1M vectors for indexing, 100K vectors for training (separate from the indexing
set) and 10K vectors for testing.

Implementation note. In our deep model, we use $n = 5$ layers (including the input
and output layers). The activation functions for layers 2 and 3 are sigmoid
functions; those for layers 4 and 5 are linear functions. The parameters
$\lambda_1$, $\lambda_2$, $\lambda_3$ and $\lambda_4$ were empirically set as
10“^, 5 x 10“^, 10“^ and 10“®, respectively. The maximum iteration number max_iter
is set to 10.

For the CIFAR-10 and MNIST datasets, the numbers of units in hidden layers 2, 3, 4
were empirically set as [90 → 20 → 8], [90 → 30 → 16], [120 → 50 → 32] and
[160 → 110 → 64] for 8, 16, 32 and 64 bits, respectively. For the SIFT1M dataset,
the numbers of units in hidden layers 2, 3, 4 were empirically set as
[90 → 20 → 8], [90 → 30 → 16], [100 → 50 → 32] and [100 → 80 → 64] for 8, 16, 32
and 64 bits, respectively.

Evaluation metric. We follow the standard setting widely used in unsupervised
hashing [6, 10, 9, 2], using Euclidean nearest neighbors to create the ground
truths for queries. The numbers of ground truths are set as in [2]. For the
CIFAR-10 and MNIST datasets, we use the 50 Euclidean nearest neighbors of each
query as its ground truth. For the large scale dataset SIFT1M, we use the 10,000
Euclidean nearest neighbors of each query as its ground truth.

We use the following evaluation metrics [6, 2] to measure the performance of the
methods: 1) mean average precision (mAP), which considers not only the precision
but also the rank of the retrieval results; 2) precision of Hamming radius $r$
(precision@r), which measures the precision on retrieved images having Hamming
distance to the query $\le r$ (if no images satisfy this, we report zero
precision).
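The two metrics can be sketched for a single query as follows. The average precision below uses the common definition (mean of the precision values at the ranks of the relevant items); the paper does not spell out its exact formula, so this definition, the function names and the toy data are assumptions. mAP is then the mean of the per-query average precision.

```python
import numpy as np

def precision_at_radius(dists, relevant, r):
    """Precision over items with Hamming distance <= r; zero if none is retrieved."""
    retrieved = dists <= r
    if not retrieved.any():
        return 0.0
    return float(relevant[retrieved].mean())

def average_precision(dists, relevant):
    """Average precision of the ranking induced by increasing Hamming distance."""
    order = np.argsort(dists, kind="stable")
    rel = relevant[order].astype(float)
    if rel.sum() == 0:
        return 0.0
    hits = np.cumsum(rel)                       # relevant items seen so far
    ranks = np.arange(1, len(rel) + 1)
    return float(np.sum(rel * hits / ranks) / rel.sum())

# Toy query: five database items with their Hamming distances and relevance flags.
dists = np.array([0, 1, 2, 5, 6])
relevant = np.array([True, False, True, True, False])

p_at_2 = precision_at_radius(dists, relevant, r=2)           # 2 of 3 retrieved
p_empty = precision_at_radius(np.array([5, 6]), np.array([True, False]), r=3)
ap = average_precision(dists, relevant)                      # (1 + 2/3 + 3/4) / 3
```

The empty-ball case returning zero is what drives precision@r down at long code lengths, since fewer database items fall within a fixed radius of the query.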

3.2. Retrieval results 

3.2.1 Results on CIFAR-10 dataset 

Figure 1 shows the retrieval results of different methods with different code
lengths $L$ on the CIFAR-10 dataset.

In terms of mAP, the proposed UDH-DNN achieves the best results for all code
lengths. The improvement is clearer at high $L$. The mAP of UDH-DNN consistently
outperforms that of the binary autoencoder (BA) [2], which is the current
state-of-the-art unsupervised hashing method.

Figure 1. Comparative evaluation on the CIFAR-10 dataset. 1(a): mAP. 1(b) and 1(c):
precision when considering retrieved images within Hamming distance 3 and 4,
respectively. Number of ground truths for each query = 50.

When precision of Hamming radius $r$ is used, the following observations are
consistent for both $r = 3$ and $r = 4$. UDH-DNN is comparable to other methods at
low $L$ (i.e. $L = 8, 16$). At $L = 32$, UDH-DNN significantly outperforms other
methods. At $L = 64$, the precision of all methods decreases. The reason is that
many query images have no neighbors at a Hamming distance of $r$ or less, and we
report zero precision for those cases. The precision of UDH-DNN is lower than that
of some compared methods at $L = 64$. However, the highest precision over all code
lengths is achieved by UDH-DNN at $L = 32$, for both $r = 3$ and $r = 4$.

Comparison with Deep Hashing (DH) [4]. We also compare our UDH-DNN with Deep
Hashing (DH) [4]. Because the implementation of DH is not available, we set up our
experiments similarly to [4] to make a fair comparison. We randomly sample 1,000
images, 100 per class, as the testing set; the remaining 59,000 images are used as
the training set. Each image is represented by a 512-D GIST descriptor [27]. The
ground truths of queries are based on their class labels^3. Similar to [4], we
report comparative results in terms of mAP at code lengths $L = 16, 32, 64$ and
the precision at Hamming radius $r = 2$ at code lengths $L = 16, 32$. We perform
the experiments 10 times and report the average performance. The comparative
results are presented in Table 2. Table 2 clearly shows that the proposed UDH-DNN
outperforms DH [4] at all code lengths, in both mAP and precision of Hamming
radius. This is because UDH-DNN contains all the necessary criteria for producing
good binary codes. Furthermore, instead of relaxing the binary constraint when
learning the network as DH [4] does, we directly solve the binary constraint
during the learning process.

^3 It is worth noting that in the evaluation of unsupervised hashing, instead of
using class labels as ground truths, most state-of-the-art methods [6, 10, 9, 2]
use Euclidean nearest neighbors as ground truths for queries.

Table 2. Comparison with Deep Hashing (DH) [4] at different code lengths on the
CIFAR-10 dataset. The results of DH are taken from the corresponding paper.

             mAP                          Precision@r = 2
Method       L = 16   L = 32   L = 64     L = 16   L = 32
DH [4]       16.17    16.62    16.96      23.33    15.77
UDH-DNN      16.83    17.52    18.02      24.97    22.20


3.2.2 Results on MNIST dataset 

Figure 2 shows the retrieval results of different methods with different code
lengths $L$ on the MNIST dataset.

The results are quite consistent with the results on the CIFAR-10 dataset. The
proposed UDH-DNN achieves the best mAP for all code lengths. The mAP improvement
is clearer at high $L$.

When precision of Hamming radius $r$ is used, all methods achieve similar
precision at low $L$ ($L = 8, 16$). At $L = 32$, UDH-DNN outperforms other methods
by a fair margin. For large $L$, i.e. $L = 64$, the precision of all methods
decreases, except for ITQ, whose precision slightly increases when $r = 4$. The
precision of UDH-DNN is lower than that of some compared methods at $L = 64$.
However, it is worth noting that the highest precision is achieved by UDH-DNN (at
$L = 32$).

3.2.3 Results on SIFTIM dataset 

As computing mAP is slow on this large dataset, we consider the top-10,000
returned neighbors when computing mAP. Figure 3 shows the retrieval results of
different methods with different code lengths $L$ on the SIFT1M dataset.

In terms of mAP, the proposed UDH-DNN outperforms all compared methods. It is
slightly better than the current state-of-the-art unsupervised hashing method, the
binary autoencoder (BA) [2].

In terms of precision of Hamming radius, the results of UDH-DNN are consistent
with its results on CIFAR-10 and MNIST. All methods achieve similar precision at
low $L$ ($L = 8, 16$). At $L = 64$, the precision of UDH-DNN is lower than that of
some methods. However, the highest precision is achieved by UDH-DNN at $L = 32$,
and it is much better than that of the competitors.

Figure 2. Comparative evaluation on the MNIST dataset. 2(a): mAP. 2(b) and 2(c):
precision when considering retrieved images within Hamming distance 3 and 4,
respectively. Number of ground truths for each query = 50.

Figure 3. Comparative evaluation on the SIFT1M dataset. 3(a): mAP. 3(b) and 3(c):
precision when considering retrieved images within Hamming distance 3 and 4,
respectively. Number of ground truths for each query = 10,000.

4. Supervised Discrete Hashing with Deep Neural Network (SDH-DNN)

Several approaches have been proposed to leverage the label information when
learning binary codes in supervised hashing. In [31, 23], binary codes are learned
such that they minimize the Hamming distance between samples belonging to the same
class, while maximizing the Hamming distance between samples belonging to
different classes. In [30], the binary codes are learned such that they minimize
the $\ell_2$ loss w.r.t. the ground truth labels.

In this work, we adapt the approach proposed in kernel-based supervised hashing
(KSH) [20] to leverage the label information. The main idea is to learn binary
codes such that the Hamming distances between the binary codes of samples are
highly correlated with the pre-computed pairwise label matrix. In other words, the
binary codes should preserve the semantic (label) similarity between samples. It
is worth noting that in KSH [20] the hash functions are linear and are defined in
the kernel space of the inputs. The independence and balance criteria are not
considered in KSH [20].

In general, the network structure of SDH-DNN is similar to that of the proposed
UDH-DNN, except that the last layer, which preserves reconstruction, is removed.
Layer $n-1$ in UDH-DNN becomes the last layer in SDH-DNN. The semantic
preservation property in SDH-DNN is imposed on the output of its last layer.

4.1. Formulation of SDH-DNN 

Following KSH [20], we first define the pairwise label matrix $S$ as

$$
S_{ij} =
\begin{cases}
1 & \text{if } x_i \text{ and } x_j \text{ are of the same class} \\
-1 & \text{if } x_i \text{ and } x_j \text{ are not of the same class}
\end{cases} \tag{23}
$$

The goal of the learning process is to learn hash functions that generate
discriminative codes, such that similar pairs can be perfectly distinguished from
dissimilar pairs by using the Hamming distance in the code space. In other words,
the Hamming distance between the learned binary codes should correlate with the
matrix $S$. Formally, the binary codes $B$ should satisfy

$$
\min_{B} \left\| \frac{1}{L} B^T B - S \right\|^2 \tag{24}
$$
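The pairwise label matrix and the criterion $\| \frac{1}{L} B^T B - S \|^2$ can be illustrated with a few lines of NumPy. The function names and the four-sample toy example are hypothetical; the codes are chosen so that samples of the same class share a code and samples of different classes have opposite codes.

```python
import numpy as np

def pairwise_label_matrix(labels):
    """S_ij = 1 if samples i and j share a class label, -1 otherwise."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    return np.where(same, 1.0, -1.0)

def semantic_loss(B, S):
    """|| (1/L) B^T B - S ||^2 for an L x m code matrix B."""
    L = B.shape[0]
    return float(np.sum((B.T @ B / L - S) ** 2))

labels = [0, 0, 1, 1]
S = pairwise_label_matrix(labels)

# L = 2 bits, one column per sample.
B_ideal = np.array([[1.0, 1.0, -1.0, -1.0],
                    [1.0, 1.0, -1.0, -1.0]])
B_poor = np.array([[1.0, -1.0, 1.0, -1.0],
                   [1.0, -1.0, 1.0, -1.0]])

loss_ideal = semantic_loss(B_ideal, S)
loss_poor = semantic_loss(B_poor, S)
```

Since $\frac{1}{L} b_i^T b_j = 1 - \frac{2}{L} d_H(b_i, b_j)$ for codes in $\{-1,1\}^L$, matching $S$ pushes the Hamming distance to 0 for same-class pairs and to $L$ for different-class pairs.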
























Using the idea of regularization as in unsupervised hashing (Sec. 2), we integrate
the above criterion into our model by solving the following constrained
optimization

$$
\min_{W,c,B} J = \frac{1}{2m} \left\| \frac{1}{L} (H^{(n)})^T H^{(n)} - S \right\|^2
+ \frac{\lambda_1}{2} \sum_{l=1}^{n-1} \left\| W^{(l)} \right\|^2
+ \frac{\lambda_2}{2m} \left\| H^{(n)} - B \right\|^2
+ \frac{\lambda_3}{2} \left\| \frac{1}{m} H^{(n)} (H^{(n)})^T - I \right\|^2
+ \frac{\lambda_4}{2m} \left\| H^{(n)} 1_{m \times 1} \right\|^2 \tag{25}
$$

$$
\text{s.t.} \quad B \in \{-1, 1\}^{L \times m} \tag{26}
$$

The main difference in formulation between the proposed UDH-DNN (12) and the
proposed SDH-DNN (25) is that the reconstruction term, which indirectly preserves
the neighbor similarity in UDH-DNN (12), is replaced by the term preserving the
semantic (label) similarity in SDH-DNN (25).

4.2. Optimization 

To solve (25) under constraint (26), we alternately optimize over $(W, c)$ and
$B$.


Algorithm 2 Supervised Discrete Hashing with Deep Neural Network (SDH-DNN)

Input:
  $X = \{x_i\}_{i=1}^{m} \in \mathbb{R}^{D \times m}$: training data; $Y$:
  training label vector; $L$: code length; max_iter: maximum iteration number;
  $n$: number of layers; $\{s_l\}_{l=2}^{n}$: number of units of layers $2 \to n$
  (Note: the number of units of layer $n$ should equal $L$); $n_s$: number of
  samples per class for computing the pairwise label matrix $S$;
  $\lambda_1, \lambda_2, \lambda_3, \lambda_4$.
Output:
  Binary code $B \in \{-1,1\}^{L \times m}$ of training data $X$ and parameters
  $\{W^{(l)}, c^{(l)}\}_{l=1}^{n-1}$.

1: Randomly select $n_s$ samples per class and compute the pairwise label matrix
   $S$ using (23).
2: Initialize $B_{(0)} \in \{-1,1\}^{L \times m}$ using ITQ [6].
3: Initialize $\{c^{(l)}\}_{l=1}^{n-1} = 0_{s_{l+1} \times 1}$. Initialize
   $W^{(1)}$ by getting the top $s_2$ eigenvectors from the covariance matrix of
   $X$. Initialize $\{W^{(l)}\}_{l=2}^{n-1}$ by getting the top $s_{l+1}$
   eigenvectors from the covariance matrix of $H^{(l)}$.
4: Compute $(W, c)_{(0)}$ with the (W, c) step (Sec. 4.2.1), using $B_{(0)}$ as
   fixed values and the initialized $\{W^{(l)}, c^{(l)}\}_{l=1}^{n-1}$ (at line 3)
   as starting point for L-BFGS.
5: for $t = 1 \to$ max_iter do
6:   Compute $B_{(t)}$ with the B step (Sec. 4.2.2), using $(W, c)_{(t-1)}$ as
     fixed values.
7:   Compute $(W, c)_{(t)}$ with the (W, c) step (Sec. 4.2.1), using $B_{(t)}$ as
     fixed values and $(W, c)_{(t-1)}$ as starting point for L-BFGS.
8: end for
9: Return $B_{(\text{max\_iter})}$ and $(W, c)_{(\text{max\_iter})}$.


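4.2.1 (W, c) step

When fixing $B$, (25) becomes an unconstrained optimization. We use the
L-BFGS [19] optimizer with backpropagation to solve it. The gradients of the
objective function $J$ w.r.t. the different parameters are computed as follows.

Let

$$
\Delta^{(n)} = \left[ \frac{1}{mL} H^{(n)} \left( V + V^T \right)
+ \frac{\lambda_2}{m} \left( H^{(n)} - B \right)
+ \frac{2\lambda_3}{m} \left( \frac{1}{m} H^{(n)} (H^{(n)})^T - I \right) H^{(n)}
+ \frac{\lambda_4}{m} H^{(n)} 1_{m \times m} \right] \odot f^{(n)\prime}(Z^{(n)}) \tag{27}
$$

where $V = \frac{1}{L} (H^{(n)})^T H^{(n)} - S$.

Let

$$
\Delta^{(l)} = \left( (W^{(l)})^T \Delta^{(l+1)} \right) \odot f^{(l)\prime}(Z^{(l)}), \quad \forall l = n-1, \cdots, 2 \tag{28}
$$

where $\odot$ denotes the Hadamard product and
$Z^{(l)} = W^{(l-1)} H^{(l-1)} + c^{(l-1)} 1_{1 \times m}$, $l = 2, \cdots, n$.

$\forall l = n-1, \cdots, 1$, we have

$$
\frac{\partial J}{\partial W^{(l)}} = \Delta^{(l+1)} (H^{(l)})^T + \lambda_1 W^{(l)} \tag{29}
$$

$$
\frac{\partial J}{\partial c^{(l)}} = \Delta^{(l+1)} 1_{m \times 1} \tag{30}
$$

4.2.2 B step

When fixing $(W, c)$, we can rewrite problem (25) as

$$
\min_{B} J = \left\| H^{(n)} - B \right\|^2 \tag{31}
$$

$$
\text{s.t.} \quad B \in \{-1, 1\}^{L \times m} \tag{32}
$$

It is easy to see that the solution of (31) under constraint (32) is
$B = \mathrm{sgn}(H^{(n)})$.

The proposed SDH-DNN method is summarized in Algorithm 2. In Algorithm 2,
$B_{(t)}$ and $(W, c)_{(t)}$ are the values of $B$ and
$\{W^{(l)}, c^{(l)}\}_{l=1}^{n-1}$ at iteration $t$.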
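Returning to the (W, c) step of Sec. 4.2.1, the contribution of the label-similarity term $\frac{1}{2m} \| \frac{1}{L} (H^{(n)})^T H^{(n)} - S \|^2$ to the gradient with respect to $H^{(n)}$, namely $\frac{1}{mL} H^{(n)} (V + V^T)$ with $V = \frac{1}{L} (H^{(n)})^T H^{(n)} - S$, can be checked numerically. The sketch below compares it with central finite differences on random data; the function names, the problem sizes and the step size are assumptions for illustration.

```python
import numpy as np

def label_term(H, S):
    """(1 / 2m) * || (1/L) H^T H - S ||^2 for an L x m matrix H."""
    L, m = H.shape
    V = H.T @ H / L - S
    return float(np.sum(V ** 2) / (2 * m))

def label_term_grad(H, S):
    """Analytic gradient w.r.t. H: (1 / mL) * H (V + V^T)."""
    L, m = H.shape
    V = H.T @ H / L - S
    return H @ (V + V.T) / (m * L)

# Hypothetical sizes: L = 3 code units, m = 6 samples, two classes.
rng = np.random.default_rng(0)
L, m = 3, 6
H = rng.standard_normal((L, m))
labels = rng.integers(0, 2, size=m)
S = np.where(labels[:, None] == labels[None, :], 1.0, -1.0)

# Central finite differences, one entry of H at a time.
eps = 1e-6
numeric = np.zeros_like(H)
for i in range(L):
    for j in range(m):
        E = np.zeros_like(H)
        E[i, j] = eps
        numeric[i, j] = (label_term(H + E, S) - label_term(H - E, S)) / (2 * eps)

max_abs_error = float(np.max(np.abs(numeric - label_term_grad(H, S))))
```

The same pattern can be applied to the other penalty terms before handing the full gradient to an L-BFGS routine.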

5. Evaluation of Supervised Discrete Hashing 
with Deep Neural Network 

This section evaluates the proposed SDH-DNN method. 
SDH-DNN is compared against several state-of-the-art 
supervised hashing methods: Supervised Discrete Hashing 
(SDH) [30], ITQ-CCA [6], KSH [20] and BRE [14]. 
For all compared methods, we use the code and the 
suggested parameters provided by the authors. 
Figure 4. Comparison between SDH-DNN and the state of the art on the CIFAR-10 dataset. 4(a): mAP. 4(b) and 4(c): Precision when 
considering retrieved images within Hamming distance 3 and 4, respectively. 



Figure 5. Comparison between SDH-DNN and the state of the art on the MNIST dataset. 5(a): mAP. 5(b) and 5(c): Precision when considering 
retrieved images within Hamming distance 3 and 4, respectively. 


5.1. Dataset, Implementation note and Evaluation 
protocol 

Dataset We evaluate the proposed method on two widely 
used datasets: CIFAR-10 and MNIST. The description of 
these datasets is provided in Section 3.1. 

Implementation note The network configuration is the same 
as that of UDH-DNN, except that the final layer is removed. The 
values of parameters $\lambda_1$, $\lambda_2$, $\lambda_3$ and $\lambda_4$ are empirically set 
as 10“^, 5, 1 and 10“"^, respectively. The maximum number of 
iterations, max_iter, is set to 5. 

For ITQ-CCA [6] and SDH [30], all training samples 
are used for training. For SDH-DNN, KSH [20] and BRE [14], 
in which label information is leveraged through the pairwise 
label matrix S, we randomly select 2,000 training samples from 
each class and use these selected samples as the new training 
set. The pairwise label matrix S in SDH-DNN is obtained 
directly using (23) because the exact labels are 
available. 

Evaluation protocol Following the standard setting for 
evaluating supervised hashing methods [30, 6], we report the 
retrieval results in two metrics: 1) mean average precision 
(mAP) and 2) precision of Hamming radius r (precision@r), 
which measures precision on retrieved images having 
Hamming distance to the query ≤ r (if no images satisfy this, 
we report zero precision). As is standard in the literature [30, 6], 
the ground truths are defined by the class labels from the 
datasets. 
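
Both metrics can be written down in a few lines. The sketch below is illustrative only: it assumes codes are stored as Python integers, treats the radius as inclusive (Hamming distance at most r), and breaks ranking ties by database order, none of which is specified in the text. The function names and toy data are hypothetical.

```python
def hamming(a, b):
    """Hamming distance between two binary codes stored as Python ints."""
    return bin(a ^ b).count("1")

def precision_at_r(query, q_label, db_codes, db_labels, r):
    """Precision over database items with Hamming distance <= r to the query."""
    hits = [lab == q_label for code, lab in zip(db_codes, db_labels)
            if hamming(query, code) <= r]
    return sum(hits) / len(hits) if hits else 0.0   # zero when nothing is retrieved

def average_precision(query, q_label, db_codes, db_labels):
    """AP of the Hamming ranking; mAP is the mean of this value over all queries."""
    order = sorted(range(len(db_codes)), key=lambda i: hamming(query, db_codes[i]))
    found, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if db_labels[i] == q_label:            # ground truth: same class label
            found += 1
            total += found / rank
    return total / found if found else 0.0

db_codes = [0b0000, 0b0001, 0b0111, 0b1111]    # toy 4-bit database
db_labels = ["cat", "dog", "cat", "dog"]
p1 = precision_at_r(0b0000, "cat", db_codes, db_labels, r=1)
ap = average_precision(0b0000, "cat", db_codes, db_labels)
```

In this toy database the query 0b0000 retrieves two items within radius 1, one of which shares its label, so `p1` is 0.5; the relevant items sit at ranks 1 and 3, so `ap` is (1 + 2/3) / 2.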

5.2. Retrieval results 
5.2.1 Results on CIFAR-10 

Figure 4 shows comparative results on the CIFAR-10 dataset. 
In terms of mAP, the proposed SDH-DNN outperforms all 
compared methods by a fair margin at all code lengths. 
The improvement of SDH-DNN over the current 
state-of-the-art supervised hashing method SDH [30] is 
+17%, +3.1%, +4.9% and +3.4% at 8, 16, 32 and 64 
bits, respectively. The improvements of SDH-DNN over 
KSH [20], which also uses the pairwise label matrix, are +7.6%, 
+6.2%, +5.9% and +5.3% at 8, 16, 32 and 64 bits, 
respectively. 

In terms of precision of Hamming radius, the proposed 
SDH-DNN clearly outperforms the compared methods at 
low code lengths, i.e., L = 8, 16. SDH [30] becomes 
comparable with SDH-DNN as the code length increases, 
i.e., L = 32, 64. 

5.2.2 Results on MNIST 

Figure 5 shows comparative results on the MNIST dataset. In 
terms of mAP, the proposed SDH-DNN outperforms the 
current state-of-the-art SDH [30] by +13.9% at L = 8 bits. As L 
increases, SDH-DNN and SDH [30] achieve similar 
performance. SDH-DNN outperforms KSH [20] at all code 
lengths; the improvements are +3%, +4.9%, +3% and +3.2% 
at 8, 16, 32 and 64 bits, respectively. 

In terms of precision of Hamming radius, SDH-DNN 
shows a clear improvement over SDH [30] when r = 3 and 
L = 8. At other settings, SDH-DNN and SDH [30] achieve 
similar performance. 

6. Conclusion 

In this paper, we propose two novel hashing methods 
for learning compact binary codes: UDH-DNN for 
unsupervised hashing and SDH-DNN for supervised 
hashing. Our methods include all necessary 
criteria for producing good binary codes, such as 
similarity preservation, independence and balance. Another 
advantage of the proposed methods is that the binary 
constraint on the codes is solved directly during optimization 
without any relaxation. The experimental results on 
three benchmark datasets show that the proposed methods 
compare favorably with state-of-the-art hashing methods. 